ABSTRACT

The MiTAP system was developed as an experimental
prototype using human language technologies for
monitoring disease outbreaks. The system provides timely,
multi-lingual, global information access to analysts,
medical experts and individuals involved in humanitarian
assistance. Thousands of articles from electronic
information sources spanning multiple languages are
automatically captured, translated, tagged, summarized, and
presented to users in a variety of ways. Real users access
MiTAP daily to solve real problems. The successful
adoption of MiTAP is attributed to its user-focused design
that accommodates the imperfect component technologies
and allows users to interact with the system in familiar
ways. We will discuss the problem, design process, and
implementation from the perspective of services provided
and how these services support system capabilities that
satisfy user requirements.
INTRODUCTION

Appropriate response to disease outbreaks and emerging
threats depends on obtaining reliable and up-to-date
information, which often means monitoring vast news
sources in many languages worldwide, a task analysts
cannot feasibly do. An effective solution requires
automated support for global tracking of emerging threats.
Analysts report that previous attempts at developing such
tools have met with failure and frustration because the tools
have often been designed with little consideration to the end
user and poorly integrated into the work environment. As
part of a research experiment on biological threats, we were
tasked to integrate human language technologies. Our goal
was to make a system that would be useful to analysts and
also used by analysts. The resulting MiTAP (2001) system
collects, annotates and categorizes documents from
multiple open news sources. By automating these processes
and by providing multiple views into the data, MiTAP
enables analysts to spend less time on gathering and
digesting raw information and more time on analysis tasks.

Acceptability through Accessibility

If users cannot access a system or cannot understand how to
use it the first time, it is unlikely they will try it a second
time. For that reason, we chose familiar, intuitive, and
reliable interfaces. Users can access data in MiTAP through
a mail/news reader or via a web-based search engine. There
are advantages to providing access through standard tools:
there is no need to install custom software; the instant sense
of familiarity with the interface is crucial in gaining user
acceptance, as little to no training is required; and data from
the browsers are easily imported into other tools.

Information at a Glance

Given time constraints, analysts are not always able to read
entire documents to determine their relevance or find
important facts. As a way of providing overviews of
document contents, MiTAP generates article summaries
and draws attention to important text within each document.
Pop-ups show lists of named entities (i.e., locations and
people) found in the document (Vilain 1999). These
summaries provide enough information to indicate
relevance to the analyst and act as shortcuts to fact retrieval.
Key words in articles are color-coded in the text so users
can quickly scan for relevant information. Entities such as
people, organizations, locations, diseases, and victim
information are highlighted (Aberdeen et al. 1996). Despite
the errorful tagging, this feature helps the user quickly
focus on key information in lengthy documents.
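
As an illustration only, color-coded entity highlighting of this kind can be sketched as follows. The sketch assumes a tagger has already produced non-overlapping character spans; the label names, the color mapping, and the `highlight` function are hypothetical and are not taken from the MiTAP implementation.

```python
import html

# Hypothetical label-to-color mapping; the paper does not specify colors.
ENTITY_COLORS = {
    "PERSON": "blue",
    "ORGANIZATION": "green",
    "LOCATION": "purple",
    "DISEASE": "red",
}


def highlight(text, entities):
    """Wrap each (start, end, label) span in a colored HTML <span>.

    Spans are assumed to be non-overlapping character offsets into text.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        # Emit the untagged text before this entity, escaped for HTML.
        pieces.append(html.escape(text[cursor:start]))
        color = ENTITY_COLORS.get(label, "black")
        pieces.append(
            '<span style="color:%s">%s</span>'
            % (color, html.escape(text[start:end]))
        )
        cursor = end
    # Emit whatever follows the last entity.
    pieces.append(html.escape(text[cursor:]))
    return "".join(pieces)


sentence = "Ebola cases reported in Uganda"
spans = [(0, 5, "DISEASE"), (24, 30, "LOCATION")]
marked_up = highlight(sentence, spans)
```

Because the markup is plain HTML, such output could be displayed by an ordinary browser or mail/news reader, in keeping with the standard-tools approach described above.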

Information from Data

Thousands of articles are captured, processed, and posted to
the system daily, too much to manage and read. To help
organize, summarize, and navigate the data, the MiTAP
news server hosts several hundred newsgroups organized
by category (i.e., source, disease, region, person, and
organization) to allow analysts with specific information
needs to locate material quickly. Articles are cross-posted
to various newsgroups, based on information extracted from
their contents. Supplementing access to the data, articles are
indexed by an information retrieval system, allowing full
text, source-specific queries over the entire archive.
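
A minimal sketch of how cross-posting by extracted category might work is given below. It assumes each article is represented as a dictionary of extracted values; the `mitap.<category>.<value>` naming scheme and the helper names `slug` and `newsgroups_for` are invented for illustration and do not describe the actual MiTAP newsgroup hierarchy.

```python
def slug(name):
    """Lower-case a name and join its words with hyphens."""
    return "-".join(name.lower().split())


def newsgroups_for(article):
    """Return the sorted newsgroup names an article would be cross-posted to.

    The article is assumed to carry one source string plus optional lists
    of extracted diseases, regions, persons, and organizations.
    """
    groups = {"mitap.source." + slug(article["source"])}
    for category in ("disease", "region", "person", "organization"):
        for value in article.get(category, []):
            groups.add("mitap.%s.%s" % (category, slug(value)))
    return sorted(groups)


example = {
    "source": "Example Wire",  # hypothetical source name
    "disease": ["Ebola"],
    "region": ["East Africa"],
}
targets = newsgroups_for(example)
```

Deriving the group list from extracted entities means an analyst who follows only one disease or region still sees every article that mentions it, regardless of source.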

Knowledge from Information

Various MiTAP newsgroups have been created to provide
high-level views of the news. The multi-document
summarization feature (Columbia Newsblaster 2002)
automatically tracks events and produces daily summaries
or high-level views of the underlying data. Another
summary is in the form of daily Top 10 watch lists of
diseases or specific people in the news (Alias I, Inc. 2002).
These views provide indications of top stories, extracts of
relevant documents, and tables of associated entities.
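
One simple way to build a daily Top 10 watch list is to count how many of the day's articles mention each entity. The sketch below takes that approach as an assumption; the paper does not describe how the cited component ranks entities, and the function name and article representation are hypothetical.

```python
from collections import Counter


def top_watch_list(articles, category="disease", n=10):
    """Rank entities of one category by the number of articles mentioning them."""
    counts = Counter()
    for article in articles:
        # Count each entity once per article so that a single long
        # article with many repeated mentions does not dominate.
        counts.update(set(article.get(category, [])))
    return counts.most_common(n)


todays_articles = [
    {"disease": ["cholera", "ebola"]},
    {"disease": ["ebola"]},
    {"disease": ["ebola", "ebola"]},
]
watch_list = top_watch_list(todays_articles, n=2)
```

Counting articles rather than raw mentions is a design choice made here for robustness against noisy tagging; other weightings are equally plausible.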

User-Centered, Task-Driven Design Approach

The focus of integrating components into MiTAP was on
providing value to end users. We wanted to create a tool
that would require no installation, little to no training, and
fit naturally into the analysts' workflow. We spent weeks in
dialogue with analysts, understanding their requirements,
and studying their work practices. This direct contact
allowed us to understand the critical elements of the task,
i.e., timely access to data, information extraction from that
data, knowledge representation of that information, and
seamless integration into the work environment.

During development, ongoing real exercises and evaluation
proved to be invaluable for measuring utility, usability, and
progress. Over a ten-day period, we used the system to
gather information on potential biological threats and to
monitor international coverage of specific events. We were
able to produce relevant information that analysts, using
other means, were not able to find. The exercise also helped
improve the performance and robustness of the system; we
were able to improve the data throughput from 4K to 8K
articles a day. We learned that we needed to enhance search
capabilities, integrate better tools for summarization and
data visualization, and provide customizable mechanisms to
allow users to track information.

The Disease of the Month Experiment was a series of
mini-evaluations designed to measure progress on a monthly
basis. We chose a scenario familiar to analysts (i.e.,
research a current disease outbreak and prepare a report) to
help minimize dependent variables and reduce training.
Test groups were compared monthly to control groups in
order to measure system utility. Comparing MiTAP to the
web and its vast amount of information, we hypothesized
that 1) MiTAP users can produce better analytic reports in a
shorter amount of time, where “better” means more
up-to-date and more complete, and 2) MiTAP users spend less
time reading documents and can digest more in a given
period of time. Test groups were also compared across
iterations to measure the progress of development.

Simultaneously, we performed independent usability
studies. To compare test vs. control groups, and test groups
across months, we defined five
categories of metrics: efficiency, task success, data quality,
user satisfaction, and usability. In our experiments, MiTAP
users provided more detail and more up-to-date information
on disease outbreaks than users of the web alone; however, they
did not necessarily spend less time doing so. Our results
also show that the test groups were able to find a larger
number of relevant articles in fewer searches. In fact, the
test groups, who were also permitted to use the web to find
information, cited MiTAP articles in their reports an
average of three times more than articles found on the web,
and often the links to the relevant web information were
found via MiTAP. Throughout this experiment, feedback
guided development, provided a comprehensive
understanding of what real users do and how we can help
them, and improved overall system performance (e.g.,
throughput increased by a factor of 2.5 while source
integration time decreased by a factor of 4). As a result of
improved performance, we were able to add many new
sources, producing a significantly richer, broader, and
larger data collection.

In addition to exercises and experiments, we used focus
groups to help design analytical tools from combinations of
integrated technologies. User surveys of early versions of
the integrated tools, as well as unsolicited feedback, helped
in ongoing improvements. We also examined the logs for
usage patterns to help us understand how the system
components were being used.

Real Users, Real Problems

MiTAP was originally designed for a group of medical
analysts interested in monitoring infectious disease
outbreaks. However, the dynamic and flexible nature of the
system has allowed it to become larger in scope,
encouraging a broad user base with a variety of interests.
Currently, over 500 users have accounts, and the domain
has expanded beyond diseases to include weapons of mass
destruction, terrorism, and warfare.

Conclusion 

MiTAP has demonstrated the utility of integrating multiple
research technologies to address the requirements of a
community of users. The end result is a set of in-demand
tools that provide the capabilities to support individuals and
organizations who must manage the information overload
resulting from the need to keep current on numerous
worldwide, multilingual events. We have learned that
building a system is more than just integrating components.
Focus on both user and task is critical, and ongoing
evaluation aids in iterative development.